
feat: changes for Bare metal AI tier release #79

Draft
kupratyu-splunk wants to merge 33 commits into main from ai-tier-v2
Conversation

@kupratyu-splunk
Collaborator

Description

Related Issues

  • Related to #

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test improvement
  • CI/CD improvement
  • Chore (dependency updates, etc.)

Changes Made

Testing Performed

  • Unit tests pass (make test)
  • Linting passes (make lint)
  • Integration tests pass (if applicable)
  • E2E tests pass (if applicable)
  • Manual testing performed

Test Environment

  • Kubernetes Version:
  • Cloud Provider:
  • Deployment Method:

Test Steps

Documentation

  • Updated inline code comments
  • Updated README.md (if adding features)
  • Updated API documentation
  • Updated deployment guides
  • Updated CHANGELOG.md
  • No documentation needed

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have updated the Helm chart version (if applicable)
  • I have updated CRD schemas (if applicable)

Breaking Changes

Impact:

Migration Path:

Screenshots/Recordings

Additional Notes

Reviewer Notes

Please pay special attention to:


Commit Message Convention: This PR follows Conventional Commits

kupratyu-splunk and others added 30 commits March 20, 2026 23:58
Version 3.0.0 does not exist in the splunk helm repo; 3.1.0 is the
latest available. Also regenerates Chart.lock with correct digest.
The splunkai_models_apps package no longer exists in ai-platform-models.
The Ray applications are now resolved relative to their working_dir zip,
so import paths should be bare module names (main:SERVE_APP / main:create_serve_app).
…ersion into ApplicationParams

Without working_dir, Ray has no zip to load main from and fails with
'No module named main'. Added WorkingDirBase and ModelVersion fields to
ApplicationParams, computed from object storage path and MODEL_VERSION
env var, and templated working_dir into all 13 app entries in applications.yaml.
…b_storage

Two bugs causing NoSuchBucket when Ray downloads working_dir zips:

1. rayS3DownloadEnv() was missing AWS_S3_ADDRESSING_STYLE=path. Boto3
   defaults to virtual-hosted style (bucket.endpoint) for custom endpoints,
   which fails DNS resolution with MinIO. Path-style (endpoint/bucket/key)
   is required for all S3-compatible stores.

2. applications.yaml used 'object_storage' as the model_loader sub-field but
   ModelLoader in model_definition.py defines it as 'blob_storage' (renamed
   in commit e62d93da). Pydantic silently ignored the unknown key, leaving
   blob_storage=None and causing a model validation error at startup.
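To make bug (1) concrete, here is an illustrative sketch (not the operator's or boto3's actual code) of how the two addressing styles shape the request URL, and why virtual-hosted style breaks DNS resolution against a single-host MinIO endpoint:

```python
def s3_url(endpoint: str, bucket: str, key: str, style: str = "virtual") -> str:
    """Illustrative only: how the S3 addressing style shapes the request URL."""
    scheme, host = endpoint.split("://", 1)
    if style == "path":
        # AWS_S3_ADDRESSING_STYLE=path: endpoint/bucket/key
        return f"{scheme}://{host}/{bucket}/{key}"
    # virtual-hosted style: bucket.endpoint/key -- 'models.minio' has no DNS record
    return f"{scheme}://{bucket}.{host}/{key}"

print(s3_url("http://minio:9000", "models", "app.zip", style="path"))
# -> http://minio:9000/models/app.zip
```

The hostname "minio:9000" and bucket "models" are hypothetical; the point is that the virtual-hosted form prepends the bucket to the host, producing a name MinIO deployments typically cannot resolve.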
…handler

Ray's s3:// protocol handler (protocol.py _handle_s3_protocol) creates a
plain boto3.Session().client('s3') with no endpoint_url, so it always hits
AWS S3 regardless of AWS_ENDPOINT_URL set on the pod. This causes NoSuchBucket
when the bucket only exists in MinIO.

Replace rayRuntimeWorkingDirScheme() with rayWorkingDirBase() which, for
S3-compatible stores with a custom endpoint, builds the working_dir as a
direct HTTP URL to MinIO (endpoint/bucket/path). Ray's https handler uses
urllib which simply fetches the URL without any S3-specific boto3 logic.

Also remove the ineffective AWS_S3_ADDRESSING_STYLE env var added in the
previous commit.
…nIO zips

Ray's s3:// protocol handler creates a bare boto3.Session().client('s3')
with no endpoint_url, so it always hits AWS S3 regardless of any custom
endpoint config. Rather than fighting Ray internals, switch to file://
working_dir pointing to app source baked into the Ray image.

- applications.yaml: replace all 'minio-zip' working_dir templates with
  file:///home/ray/ray/applications/entrypoint (Entrypoint) and
  file:///home/ray/ray/applications/generic_application (all other apps)
- builder.go: remove WorkingDirBase, ModelVersion fields and rayWorkingDirBase()
  function — no longer needed since working_dir is a static file:// path
- builder_test.go: remove TestRayWorkingDirBase test for deleted function
…ote URL for others

PromptInjectionTfidf, PromptInjectionCrossEncoder, PromptInjectionClassifier are
baked into the Ray worker image at /home/ray/ray/applications/generic_application,
so they use file:// working_dir with no network dependency.

All other apps (UaeLarge, AllMinilmL6V2, BiEncoder, MbartTranslator, etc.) continue
to use {{.WorkingDirBase}}/AppName-{{.ModelVersion}}.zip resolved at runtime from
the configured object storage (s3, gs, azure, or s3compat/MinIO endpoint).
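The runtime-resolved URI above can be sketched as a small helper (the function name and bucket path are illustrative; the operator actually expands a Go text/template):

```python
def working_dir_uri(base: str, app_name: str, model_version: str) -> str:
    # Mirrors the {{.WorkingDirBase}}/AppName-{{.ModelVersion}}.zip template
    # from applications.yaml.
    return f"{base}/{app_name}-{model_version}.zip"

print(working_dir_uri("s3://ai-models/apps", "UaeLarge", "1.2.0"))
# -> s3://ai-models/apps/UaeLarge-1.2.0.zip
```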
…cile

- saia/impl.go: bump default memory request 1Gi->2Gi, limits CPU 1->2 / memory 2Gi->4Gi
  to prevent kubelet OOMKill during SAIA startup
- reconciler.go: preserve existing AIService Resources on reconcile so user-set limits
  are not wiped back to defaults on every AIPlatform reconcile
Ray requires file:// working_dir URIs to point to a .zip or .whl file.
Update the 3 prompt injection apps to reference generic_application.zip
which is built during the Docker image build in ai-platform-models.
…g_dir base

All 13 Ray Serve apps now use the generic_application.zip baked into the
Ray head image via file://, eliminating the need to upload versioned zips
to MinIO. Also fixes rayWorkingDirBase to return s3:// for all S3-compatible
backends (minio, seaweedfs, s3compat) so AWS_ENDPOINT_URL on the pods
redirects boto3 to the MinIO endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
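A minimal sketch of the scheme selection this fix describes (the backend names and fallback are assumptions drawn from this message, not the actual rayWorkingDirBase implementation):

```python
S3_COMPATIBLE = {"s3", "minio", "seaweedfs", "s3compat"}

def working_dir_scheme(backend: str) -> str:
    # All S3-compatible backends return s3:// so that AWS_ENDPOINT_URL on the
    # pods redirects boto3 to the custom endpoint (e.g. MinIO).
    if backend in S3_COMPATIBLE:
        return "s3"
    return {"gs": "gs", "azure": "azure"}.get(backend, "file")
```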
MbartTranslatorDeployment hardcodes its blob_prefix and does not accept
model_definition as an init arg, so passing it via .bind() caused a
Ray pickle error.
Replace Llama31Instruct (8b) with GptOss20b and Llama3170bInstructAwq (70b)
with GptOss120b. L40S-only, tool_parser: openai, VLLM_ATTENTION_BACKEND:
TRITON_ATTN, 1 GPU for 20b and 4 GPUs for 120b.
…env override

Actor-level runtime_env.env_vars in gpu_type_options_override replaces the
app-level runtime_env, causing APPLICATION_NAME and other vars to be lost.
Move VLLM_ATTENTION_BACKEND: TRITON_ATTN to the top-level env_vars instead.
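The replace-vs-merge behavior can be illustrated with plain dicts (the values are hypothetical):

```python
app_env = {"APPLICATION_NAME": "GptOss20b", "MODEL_VERSION": "1.0.0"}
override_env = {"VLLM_ATTENTION_BACKEND": "TRITON_ATTN"}

# Actor-level runtime_env.env_vars REPLACES the app-level dict wholesale:
replaced = override_env
assert "APPLICATION_NAME" not in replaced  # app-level vars silently lost

# Moving the override into the top-level env_vars keeps everything:
merged = {**app_env, **override_env}
assert merged["APPLICATION_NAME"] == "GptOss20b"
assert merged["VLLM_ATTENTION_BACKEND"] == "TRITON_ATTN"
```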
…r terminating nodes during model load

Large model loading (e.g. gpt-oss-120b) takes several minutes. Without an
idle timeout, the Ray autoscaler terminates worker nodes after 60s (default),
killing the replica mid-load with SIGTERM.
…pec field

WorkerGroupSpec.idleTimeoutSeconds is rejected as unknown by the installed
KubeRay CRD version. AutoscalerOptions.IdleTimeoutSeconds is set at the
cluster level and read directly by the Ray autoscaler process, achieving
the same effect without requiring a CRD upgrade.

600s idle timeout prevents the autoscaler from terminating worker nodes
while large models (e.g. gpt-oss-120b) are loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
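For reference, the cluster-level setting described above lands in the RayCluster spec roughly like this (a sketch; verify the field path against your installed KubeRay CRD version):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
spec:
  autoscalerOptions:
    idleTimeoutSeconds: 600  # keep idle workers 10 min so large model loads survive
```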
The 120b model download/load exceeds the previous 50Gi ephemeral storage
limit, causing pod eviction. 200Gi matches the storage needed for large
model artifacts. Memory increased from 16Gi to 64Gi to support vLLM
process memory requirements for a 120b model.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The model is mxfp4 quantized at 65GB on disk, requiring more than a single
L40S (46Gi) GPU. Switch to num_gpus=2 / tensor_parallel_size=2 to use
2x L40S = 92Gi, comfortably fitting the model at runtime.

Also increase l40s-1-gpu ephemeral-storage to 200Gi and memory to 64Gi
to prevent pod eviction during large model downloads.
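The sizing argument above, written out as arithmetic (numbers taken directly from this message):

```python
model_size_gib = 65        # mxfp4-quantized gpt-oss-120b on disk
l40s_mem_gib = 46          # usable memory per L40S, per the message above
tensor_parallel_size = 2   # num_gpus=2 / tensor_parallel_size=2

assert model_size_gib > l40s_mem_gib                       # one L40S cannot hold it
assert tensor_parallel_size * l40s_mem_gib == 92           # 2x L40S = 92Gi
assert tensor_parallel_size * l40s_mem_gib > model_size_gib  # fits at runtime
```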
…e to 200Gi for gpt-oss-120b"

This reverts commit 03ef451.
…atorType

- Add H100 worker tiers to instance.yaml (h100-0-gpu, h100-1-gpu)
- Add H100 instanceScale block to features/saia.yaml
- Add AcceleratorType field to ApplicationParams in builder.go, populated
  from effectiveAcceleratorType(), so applications.yaml can template gpu_types
- Template gpu_types in applications.yaml for GptOss20b and GptOss120b
  using {{.AcceleratorType}} instead of hardcoded ["L40S"]
- Add H100 gpu_type_options_override and gpu_type_model_config_override
  entries for GptOss20b (0.5 GPU, tp=1) and GptOss120b (1 GPU, tp=1)
- Fix UaeLarge H100 num_gpus and gpu_memory_utilization: 0.025 -> 0.0375
- Require yq in preflight checks for eks and k0s scripts (fail instead
  of silently falling back to fragile grep/awk parsing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eks_cluster_with_stack.sh:
- Read GPU_CAPACITY_RESERVATION_ID/AZ and GPU_AVAILABILITY_ZONES from
  cluster-config.yaml in load_config()
- generate_node_groups(): skip standard GPU node group for H100 with
  capacity reservation; add availabilityZones support for other types
- New create_gpu_nodegroup_with_capacity_block(): CloudFormation-based
  H100 node group using CapacityType: CAPACITY_BLOCK, only invoked when
  defaultAcceleratorType=H100 and capacityReservation.id is set
- create_cluster_flow/reconcile_flow: gate capacity block creation on
  DEFAULT_ACCELERATOR=H100, idempotent GPU node count check
- main_install: export AWS_DEFAULT_REGION/AWS_REGION after load_config
- Add missing --region flag to 3 eksctl create iamserviceaccount calls

k0s_cluster_with_stack.sh:
- load_config: read defaultAcceleratorType from config, default L40S

cluster-config.yaml:
- GPU TYPE QUICK REFERENCE comment block: L40S/H100/H100_NVL instance
  types, when to use capacityReservation and availabilityZones
- H100-only capacityReservation and availabilityZones commented-out blocks
- defaultAcceleratorType comment cross-referencing instance types

k0s-cluster-config.yaml (new file):
- Config template for k0s script with GPU TYPE QUICK REFERENCE
- Documents L40S/H100/H100_NVL gpuWorker instance types alongside
  defaultAcceleratorType

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kupratyu-splunk and others added 3 commits March 31, 2026 14:10
…26-25518

- google.golang.org/grpc: v1.78.0 → v1.79.3 (Critical: CVE-2026-33186)
- github.com/cert-manager/cert-manager: v1.18.0 → v1.18.5 (Medium: CVE-2026-25518)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add GPUWorkerConfig to AIPlatformSpec CRD with instanceTypes and
  instanceScale fields, allowing new GPU types to be defined in
  cluster-config.yaml without rebuilding the operator image
- Operator creates <name>-instances, <name>-feature-<name>, and
  <name>-applications ConfigMaps seeded from image defaults on first
  deploy; reads from them on every reconcile (filesystem fallback)
- ReconcileInstancesConfigMap and ReconcileFeatureConfigMaps merge new
  GPU type keys from spec into ConfigMaps; existing keys never overwritten
- ReconcileApplicationsConfigMap fixed (was using broken HOME path)
- Uncomment InstancesConfigMap, FeatureConfigMaps, ApplicationsConfigMap
  reconcile stages; add FEATURE_CONFIG_DIR env var support
- Add custom-gpu-accelerator.md guide documenting Path 1 (fresh install
  via gpuWorkerConfig) and Path 2 (existing cluster update)
- load_config() reads aiPlatform.gpuWorkerConfig from k0s-cluster-config.yaml
  using yq and indents it for embedding in the AIPlatform CR spec
- install_ai_platform_cr() injects gpuWorkerConfig block when present,
  and also sets defaultAcceleratorType (was missing from k0s CR)
- k0s-cluster-config.yaml: add commented gpuWorkerConfig example section
  matching the EKS cluster-config.yaml format
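The never-overwrite merge semantics used by the ConfigMap reconcilers can be sketched as follows (the function name is illustrative, not the operator's):

```python
def merge_new_keys(existing: dict, from_spec: dict) -> dict:
    # Add GPU type keys that are new in the spec; never overwrite keys
    # already present in the ConfigMap (user edits win).
    return {**from_spec, **existing}

cm = {"l40s-1-gpu": "user-tuned"}
spec = {"l40s-1-gpu": "default", "h100-1-gpu": "new"}
print(merge_new_keys(cm, spec))
# -> {'l40s-1-gpu': 'user-tuned', 'h100-1-gpu': 'new'}
```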